Abstract

This paper considers top-k search of the following style. Let S be a set of elements, each associated
with a real-valued score. Let Q be a (possibly infinite) set of predicates. Given a predicate q ∈ Q and an
integer k ≥ 1, a top-k query reports the k elements in S(q) with the highest scores, where S(q) denotes
the set of elements in S satisfying q. The objective is to store S in a structure so that queries can be
answered efficiently.

We present a general technique that reduces top-k search to three related problems: (i) counting:
finding the size of S(q), (ii) (essentially) max-search: finding the maximum score of the elements in
S(q), and (iii) τ-reporting: finding all the elements in S(q) whose scores are at least a value τ given at
query time. For a number of fundamental top-k problems, their counting, max-search, and τ-reporting
variants have already been settled previously. Our technique immediately leads to Las Vegas structures
for solving those top-k problems with good expected efficiency.

As a second step, we improve the query performance for the top-k range reporting problem. Specifically,
the elements in S are one-dimensional points. Given a range q = [x_1, x_2] and an integer k ≥ 1, a
query returns the k points in S ∩ q with the highest scores. In external memory, we give a linear-size Las
Vegas structure that answers a query in O(lg_B n + k/B) I/Os and can be updated in O(lg_B n) amortized
I/Os per insertion and deletion, where n = |S| and B is the block size. The space cost holds in the worst
case, whereas the query and update time holds with high probability and in expectation simultaneously.
To obtain this result, we develop a technique that combines random sampling (for handling large k) and
tabulation (for small k), which we believe is of independent interest.



1 Introduction 



In a typical reporting problem, a query returns all the input elements satisfying a certain predicate. In
some applications, elements have priorities such that only the most important ones should be reported. For
instance, a user looking for a hotel may request "the k best-rated hotels whose prices are between 100 and
200 dollars per night", where k depends on her/his preference and is given only at run time. The request
can be served by first finding all the hotels fulfilling the price condition, and then fetching the best k.
Clearly, this approach has the drawback of incurring long response time when many hotels pass the price
check. Ideally, we should be able to find the top-k hotels directly without wasting time extracting the rest.
Similar situations arise in many different areas, including databases [27], networking [21], and
web search [25], to mention just a few.

Motivated by this, we study top-k search of the following style. Let S be a set of elements, each
associated with a distinct real-valued score. A query is given a predicate q drawn from a (perhaps infinite)
set Q, and an integer k ≥ 1. It reports the k elements in S(q) with the highest scores, where S(q) denotes
the set of elements in S satisfying q. If |S(q)| < k, the entire S(q) is returned. The objective is to store S in
a structure so that queries can be answered efficiently.

Models and conventions. Our discussion will assume the external memory model (a.k.a. the I/O model)
[4]. Let B be the size (in words) of a disk block. A word is assumed to have Ω(lg n) bits, where n is
the underlying problem's input size. Space is measured by the number of blocks occupied by a structure,
whereas time is measured by the number of I/Os performed by an algorithm. Compared to word-RAM, the external
memory model has the "advantage" of free CPU computation. We will not abuse this advantage, so all our
results hold directly in word-RAM with B set to an appropriate constant.

A complexity holds in the worst case unless otherwise stated. A logarithm lg_b x is defined as
max{1, log_b x}, and has base b = 2 by default. The input size n is at least B. Linear cost is
understood as O(n/B), whereas logarithmic cost is O(lg_B n). With high probability (henceforth, w.h.p.) refers to
a probability of at least 1 − 1/n^2, where the exponent 2 can be replaced with any positive constant. All lemmas
and theorems hold when n is greater than a sufficiently large constant.

1.1 Our results 

Top-k search has some natural companion problems, which assume the same input S and predicate set Q,
but differ in query formulation. In the first companion, the counting problem, a query is given a predicate
q ∈ Q, and returns |S(q)|. In the second, the top-constant-score problem, a query is given q ∈ Q, and returns
the c-th highest score of the elements in S(q), where c is a constant fixed for all queries. If |S(q)| < c, the
query returns ∞. In the third companion, the τ-reporting problem, a query is given q ∈ Q and a real value
τ, and reports the elements in S(q) with scores at least τ.

We show that top-k search is a combination of the above three companions in disguise, as far as expected
efficiency is concerned. Specifically, we give a general technique to reduce top-k search to its companions
with no performance deterioration, i.e., the (expected) space, query and update cost of the resulting top-k
structure is determined by the most expensive of the companion structures. The technique gives rise to
elegant Las Vegas structures for the top-k version of several fundamental problems:

Top-k range reporting. In this problem, S is a set of points in the real domain ℝ. Given an interval
q = [x_1, x_2] in ℝ and an integer k, a query reports the k points with the highest scores in S(q) = S ∩ q.
Our structure uses linear space (expected and w.h.p.), answers a query in O(lg_B n + k/B) expected
I/Os, and supports an insertion and deletion in O(lg_B n) amortized I/Os (expected and w.h.p.). This
is the first structure to achieve logarithmic query and update time in external memory (similar results
were known in internal memory nearly two decades ago [10]). Previously, the best structure [27]
has linear space and O(lg_B n + k/B) query cost, but requires O(lg_B^2 n) amortized I/Os to handle an
update.

Top-k stabbing. S is a set of intervals in ℝ. Given a value q ∈ ℝ and an integer k, a query reports the k
intervals with the highest scores among the intervals of S containing q. Our structure uses linear space
(expected and w.h.p.), answers a query in O(lg_B n + k/B) expected I/Os, and supports an insertion
and deletion in O(lg_B n) amortized I/Os (expected and w.h.p.). We are not aware of any previous
result on this problem. For k = 1, the problem is known as stabbing max, for which Agarwal et al.
[3] presented a linear-size structure with logarithmic query and amortized update time.

Top-k 3-sided range search. S is a set of points in ℝ^2. Given a rectangle q = [x_1, x_2] × [y, ∞) and an
integer k, a query reports the k points with the highest scores in S ∩ q. We obtain a static structure
that uses O((n/B) · lg(n/B) / lg lg_B n) space (expected and w.h.p.) and answers a query in O(lg_B n + k/B)
expected I/Os. The space-query tradeoff matches the best achievable by any deterministic structure. We again
are not aware of any previous result.

It is perhaps remarkable how effortlessly we develop the structures from our reduction, because none of
the three problems appears to be trivial at all. By contrast, the previous structures [2, 27] for top-k range
reporting are much more involved.

The query efficiency of our reduction does not necessarily hold w.h.p. To make that happen, special
effort is needed to hack into the black box. In this paper, we do so for top-k range reporting: our
ultimate structure uses O(n/B) space (worst case), answers a query in O(lg_B n + k/B) I/Os (expected and w.h.p.),
and can be updated in O(lg_B n) amortized I/Os (expected and w.h.p.). For k = O(polylg n), the update and
query time holds in the worst case.

1.2 Techniques 

Given elements e_1 and e_2 from an ordered domain, we say that e_1 is smaller than e_2 if e_1 precedes e_2;
otherwise, e_1 is greater than e_2. Let L be a set of elements from an ordered domain. Given an element
e, define its rank in L as the number of elements in L that are at least e (i.e., the largest element of L has rank 1).
This work was initiated by a rudimentary observation. Take a sample set R of L by including each element
independently with probability p. Suppose that we want to find the element with rank k in L. Then, the
element with rank roughly kp in R should be quite close to what we are looking for.
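The observation rests on a one-line expectation, spelled out here as our own illustration (it is not an argument quoted from the paper). If e has rank r in L, each of the r elements of L that are at least e enters R independently with probability p, so

$$ \mathbb{E}\bigl[\,|\{x \in R : x \ge e\}|\,\bigr] = rp . $$

Taking r = k, the element of rank k in L is expected to have rank about kp in R; conversely, the element of rank ⌈kp⌉ in R should have rank close to k in L, with the deviation controlled by standard Chernoff bounds.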

The proposed reduction results from setting kp to an appropriate constant in the above "rank sampling" 
observation. This enables us to retrieve just the top few elements in R, and use their scores to search for the 
real top-k elements. The challenge is to account for the error of sampling. We show how to keep the error in
check by preparing multiple sample sets with exponentially decaying sampling rates, and a doubling trick 
in query processing. 

We now turn our attention to ensuring good query efficiency w.h.p. for top-k range reporting. Unfortunately,
rank sampling falls short for this purpose: its reliance on statistical significance prevents it from working on
small k. Following [27], we instead target a different problem:

Approximate range k-selection. Let S be as defined in top-k range reporting. Given an interval q = [x_1, x_2]
in ℝ and an integer k satisfying 1 ≤ k ≤ |S ∩ q|, a query returns a point e ∈ S ∩ q such that at least
k but less than ck points in S ∩ q have higher scores than e, where c > 1 is a constant fixed for all
queries.






Suppose that a linear-size structure solves the above problem under an arbitrary c with query time t_q and
update time t_u. This directly implies a linear-size structure for top-k range reporting with query time t_q +
O(lg_B n + k/B) and update time t_u + O(lg_B n) [27]. Indeed, the main contribution of [27] is a structure
with t_q = O(lg_B n) and t_u = O(lg_B^2 n). Our goal is to lower t_u to O(lg_B n).

Rank sampling again lends a helping hand: it allows us to exploit an O(polylg n)-update-time structure
to handle queries with large k. Intuitively, we can set the sampling rate p to roughly 1/polylg n, and then
apply the structure on the sample set. Since the number of updates has been decreased by a factor of polylg n, the
update cost is brought down to O(1) amortized. Combining this idea with the structure of [27], we achieve
query time O(lg_B n + k/B) and update time O(lg_B n) w.h.p. for k = Ω(lg^2 n).

The sampling technique no longer works for k = O(lg^2 n), whose handling eventually boils down to
another problem:

Approximate (f, l)-group k-selection. An (f, l)-group G is a list of f disjoint sets G_1, ..., G_f, each of
which has at most l elements drawn from the same ordered domain. Given an interval q = [a_1, a_2]
with 1 ≤ a_1 ≤ a_2 ≤ f and an integer k with 1 ≤ k ≤ |∪_{i∈q} G_i|, a query returns an element whose
rank in ∪_{i∈q} G_i falls in [k, ck), where c > 1 is a constant fixed for all queries.

The difficulty is to solve the problem when f = lg^ε n for some positive ε < 1, l = O(polylg n), and
lg lg n > √B. We observe that under those constraints, the problem can be dealt with using tabulation.
Specifically, we precompute a table of o(n) bits so that any instance of the problem can be represented
in a compressed form that guarantees query time O(lg_B(fl)). Furthermore, the table even captures all
the transitions from one compressed form to another due to an update in any of the sets in G. In this
way, we support an update in O(lg_B(fl)) I/Os, which in turn gives the desired deterministic structure for
approximate range k-selection with k = O(polylg n). Our approach abandons the indivisibility assumption,
and demonstrates new power in fully leveraging all the bits available for storage. Even though eliminating
indivisibility has prevailed in the word-RAM model, research of this form has emerged only recently in external
memory [?, 22].

At a high level, our solution presents a framework for designing top-k structures with logarithmic update
time w.h.p. First, obtain a structure that is query efficient but incurs O(polylg n) update time. Then, applying
rank sampling, we use this structure to handle large k = Ω(polylg n), and tackle the case k = O(polylg n)
by combining approximate rank selection with our compression technique. We believe that this framework
is of independent interest.

Remark. Using random sampling to approximate ranks is not new. This was a main idea behind many
classic results whose complete coverage is far beyond this paper (see, e.g., [7, 10, ?, 20]). What remains
non-trivial is how to employ it with other observations to obtain an elegant reduction for generic top-k
search. Moreover, how we combine sampling with tabulation in designing external memory structures is
also new to the best of our knowledge.

1.3 Previous results 

Top-k range reporting was first studied by Afshani, Brodal and Zeh [2], who gave a static linear-size structure
with O(lg_B n + k/B) query time. They also analyzed the space-query tradeoff for an ordered variant of
the problem, where the top-k elements need to be reported in the order of their scores (no order requirement
exists in our problems). As pointed out in [27], the lower bound of [2] suggests that, when space usage
must be linear, a simple approach already achieves near-optimal query efficiency: first solve the unordered
version in O(lg_B n + k/B) I/Os, and then sort the result elements. For the unordered version, Cheng and
Tao [27] proposed a dynamic structure that matches the space and query cost of [2], but can also be updated
in O(lg_B^2 n) amortized I/Os per insertion and deletion.

In RAM, by combining a priority search tree [23] and Frederickson's selection algorithm [13] on min-heaps,
one can obtain a structure that uses O(n) words, answers a query in O(lg n + k) time, and supports
an insertion and a deletion in O(lg n) time. It is unclear whether the structure can be adapted to work
efficiently in external memory (straightforward adaptation results in O(lg n + k) query time, instead of
O(lg_B n + k/B)). Brodal, Fagerberg, Greve and Lopez-Ortiz [9] considered a special instance of the
problem where the input points of S are from the domain [n]¹. They gave a linear-size structure with
O(1 + k) query time (which holds also for the ordered version). Note that, by fixing k = 1, the problem
studied in [9] specializes into the well-known range minimization query problem [11].
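The selection step in the RAM approach above amounts to extracting the k largest keys from a heap-ordered tree. Frederickson's algorithm [13] does this in O(k) time; the sketch below instead uses the simpler best-first search with an auxiliary priority queue, which takes O(k lg k) time, so it illustrates the subproblem rather than the cited algorithm. It assumes an array-based binary max-heap (highest score at the root) instead of the min-heaps mentioned in the text; `heap_top_k` is a hypothetical helper.

```python
import heapq


def heap_top_k(heap, k):
    """Return the k largest keys of an array-based binary max-heap
    (children of slot i at 2i + 1 and 2i + 2), in descending order."""
    if not heap or k <= 0:
        return []
    result = []
    frontier = [(-heap[0], 0)]  # heapq is a min-heap, so keys are negated
    while frontier and len(result) < k:
        neg_key, i = heapq.heappop(frontier)
        result.append(-neg_key)
        # A child can only enter the answer after its parent has been output.
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(frontier, (-heap[child], child))
    return result


values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
max_heap = [-x for x in values]
heapq.heapify(max_heap)            # min-heap of negated values ...
max_heap = [-x for x in max_heap]  # ... negated back is a valid max-heap
print(heap_top_k(max_heap, 4))     # [9, 6, 5, 5]
```

Because every element enters the frontier only after its parent was reported, the frontier never holds more than k + 1 entries, which is where the O(k lg k) bound comes from.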

We are not aware of existing results for top-k stabbing and 3-sided range search. In [22], Larsen and Pagh
mentioned an application of top-k 3-sided range search: a solution to this problem also settles a document
selection problem called top-k colored prefix reporting. They presented a structure in the scatter I/O model,
where an I/O may read/write B arbitrary words on the disk (i.e., these words need not be consecutive). Our
top-k 3-sided structure is the first one in the traditional external memory model solving the top-k colored
prefix reporting problem with logarithmic query cost plus linear output time.

The exact version of approximate range k-selection (i.e., a query should return the point with precisely
the specified rank) has been studied in internal memory as the range median problem. Currently, the best
dynamic structure [12] uses O(n) space, and answers a query and supports an update in O((lg n / lg lg n)^2) time.
This structure does not suit our needs because we aim at logarithmic query and update time, not to mention that
this structure remains to be externalized.

2 A reduction technique for general top-k search

This section describes a framework for designing index structures for top-k search with attractive expected
efficiency. We then apply the framework to solve the three instances of top-k search mentioned in
Section 1.1.

2.1 Rank sampling 

Let L be a set of elements. We say that R is a p-sample set of L if R is obtained by independently including 
each element of L with probability p. 

Lemma 1. Let R be a p-sample set of L. Let k ≥ 1 and δ ∈ (0, 1) satisfy kp ≥ 3 ln(1/δ). If |L| ≥ 4k, the
following hold simultaneously with probability at least 1 − δ:

- |R| ≥ 2kp, and

- the element with rank ⌈2kp⌉ in R has rank between k and 4k in L.

All the omitted proofs (such as the above one) can be found in the appendix.
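Lemma 1 can be checked empirically on a small instance. The sketch below is our own illustration (the helpers `rank_in` and `lemma1_trial` are hypothetical names): it draws a p-sample R of L, picks the element of rank ⌈2kp⌉ in R, and reports that element's rank in L. With k = 1000 and p = 0.1 we have kp = 100, far above 3 ln(1/δ) for any practically relevant δ, so every trial should land in [k, 4k].

```python
import math
import random


def rank_in(L, e):
    """Rank of e in L: the number of elements of L that are at least e."""
    return sum(1 for x in L if x >= e)


def lemma1_trial(L, k, p, rng):
    """Draw a p-sample R of L and return the rank in L of the element whose
    rank in R is ceil(2kp); return None if R has fewer than ceil(2kp) elements."""
    R = [x for x in L if rng.random() < p]  # include each element independently
    t = math.ceil(2 * k * p)
    if len(R) < t:
        return None
    e = sorted(R, reverse=True)[t - 1]      # element with rank t in R
    return rank_in(L, e)


rng = random.Random(7)
L = list(range(10_000))
ranks = [lemma1_trial(L, 1000, 0.1, rng) for _ in range(30)]
print(min(ranks), max(ranks))  # both fall in [1000, 4000] in practice
```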

2.2 Reduction 

We now present a reduction from top-k search to counting, top-constant-score search, and τ-reporting. Let
S be the input set of the top-k problem, and n = |S|. For each i ∈ [0, ⌊lg n⌋], take a p_i-sample set S_i of
S, where p_i = min{1, (3 ln 9)/2^i}. Index each S_i with a structure for answering top-⌈6 ln 9⌉-score queries.

¹Given an integer x ≥ 1, [x] represents the set of integers between 0 and x − 1.
Finally, store S in two structures for answering count and τ-reporting queries, respectively. The τ-reporting
structure needs to be worst-case query efficient. Let T_rep + c·t/B be the worst-case time required to answer
a τ-reporting query on a set of n elements, where c > 0 is a constant, and t is the number of elements
reported.

Let us see how to answer a top-k query with predicate q ∈ Q. If k ≤ 3 ln 9, run a top-k-score query
on S_0 (= S) to obtain a score s, and then report the result of an s-reporting query on S. If k > n/2,
scan the entire S to retrieve S(q), i.e., the set of elements in S qualifying q. Then, perform k-selection (in
linear time [8]) on the elements in S(q) to find the k-th highest score s of the elements there, and report all
elements in S(q) whose scores are at least s.

For k ≤ n/2, the query algorithm first runs a count query to obtain |S(q)|. The rest of the algorithm
executes in rounds with a parameter k', where k' is set to the smallest power of 2 at least k in the first
round, and doubles each time a new round begins. In each round, if |S(q)| < 4k', retrieve S(q) with a
(−∞)-reporting query on S with predicate q, find the top-k elements from S(q) by k-selection, and finish the
entire algorithm. Otherwise, perform a top-⌈6 ln 9⌉-score query on S_{lg k'}; let s be the score returned by this
query. Run an s-reporting query on S with predicate q in a cost-monitoring manner: force the query to
terminate as soon as its cost reaches T_rep + c·4k'/B. If the query has not terminated normally at this
point, we say that it has aborted. If the query terminated normally (i.e., it did not abort), let S' be the set of
elements retrieved. If |S'| ≥ k, finish the algorithm by finding the top-k elements in S' with k-selection. If
|S'| < k or the s-reporting query aborted, double k', and perform another round with the new k'.
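The round structure above can be traced on a small in-memory example. The sketch below is our own illustration, not the paper's implementation: the three companion structures are replaced by brute-force scans, the cost-monitored s-reporting query is modelled by treating any query that would report more than 4k' elements as aborted, and the special case k ≤ 3 ln 9 is folded into the doubling loop (answers stay correct there). `TopKByReduction` and its methods are hypothetical names.

```python
import math
import random

C = math.ceil(6 * math.log(9))  # c = ceil(6 ln 9) = 14


class TopKByReduction:
    """Control flow of the reduction; the counting, top-c-score and tau-reporting
    structures are replaced by brute-force scans (illustration only)."""

    def __init__(self, items, seed=0):
        # items: (key, score) pairs with distinct scores; predicates test the key.
        rng = random.Random(seed)
        self.items = list(items)
        n = len(self.items)
        # S_i is a p_i-sample of S with p_i = min{1, 3 ln 9 / 2^i}, i = 0..floor(lg n).
        self.samples = []
        for i in range(n.bit_length()):
            p = min(1.0, 3 * math.log(9) / 2 ** i)
            self.samples.append([e for e in self.items if rng.random() < p])

    def _matching(self, pool, pred):
        return [e for e in pool if pred(e[0])]

    def _top_c_score(self, i, pred):
        # c-th highest score among S_i(q), or infinity if |S_i(q)| < c.
        scores = sorted((e[1] for e in self._matching(self.samples[i], pred)), reverse=True)
        return scores[C - 1] if len(scores) >= C else math.inf

    def _tau_report(self, pred, tau, limit):
        # Stand-in for cost monitoring: a query that would report more than
        # `limit` elements is treated as aborted (returns None).
        out = [e for e in self._matching(self.items, pred) if e[1] >= tau]
        return None if len(out) > limit else out

    def top_k(self, pred, k):
        best = lambda xs: sorted(xs, key=lambda e: e[1], reverse=True)[:k]  # k-selection
        if k > len(self.items) / 2:
            return best(self._matching(self.items, pred))
        size = len(self._matching(self.items, pred))  # count query
        kp = 1 << (k - 1).bit_length()  # smallest power of 2 that is >= k
        while True:
            if size < 4 * kp:  # (-inf)-reporting query, then k-selection
                return best(self._matching(self.items, pred))
            s = self._top_c_score(kp.bit_length() - 1, pred)  # query S_{lg k'}
            got = self._tau_report(pred, s, limit=4 * kp)
            if got is not None and len(got) >= k:
                return best(got)
            kp *= 2  # the round failed: double k'
```

Since only the control flow is real, the code says nothing about I/O cost. What it does show is the Las Vegas property: a round either ends with a candidate set that provably contains the top-k answer (all elements of S(q) with score at least s, and at least k of them), or it doubles k'.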

If the counting, top-⌈6 ln 9⌉-score and τ-reporting structures all support updates, they can be maintained
in a straightforward manner along with the updates on S, so that our overall top-k structure is dynamic.
In particular, an update concerns the top-⌈6 ln 9⌉-score structure on S_i if and only if the element being
inserted/deleted is sampled in S_i. For each element of S, we can easily keep track of the sample sets it
belongs to with hashing.

Next, we analyze the performance guarantees of our reduction. For this purpose, suppose that, on a
set of n elements, a count, top-⌈6 ln 9⌉-score, and τ-reporting query can be answered in T_cnt, T_topc, and
(as mentioned before) T_rep + O(t/B) time, respectively. Furthermore, suppose that the structures for those
queries can be updated in U_cnt, U_topc and U_rep amortized time, respectively (if a structure is static, its update
time is ∞). We have:

Theorem 1. The structure obtained with our reduction correctly answers every top-k query with T_cnt +
O(T_topc + T_rep + k/B) I/Os in expectation. It supports any sequence of n updates on an initially empty
input with n(U_cnt + U_rep + O(U_topc)) I/Os both in expectation and w.h.p.

Proof. We prove only the query cost, and leave the update time to the appendix. It is obvious that: (i) for
k ≤ 3 ln 9, the query cost is T_topc + T_rep + O(1), and (ii) for k > n/2, the query cost is O(n/B) = O(k/B).
Next, we assume 3 ln 9 < k ≤ n/2. Set c = ⌈6 ln 9⌉ in the rest of the proof.

Consider the execution of a round with parameter k'. If |S(q)| < 4k', the round performs T_rep + O(k'/B)
I/Os. Otherwise, we apply Lemma 1 with δ = 1/3: since k'·p_{lg k'} = min{k', (k'/2^{lg k'}) · 3 ln 9} = min{k', 3 ln 9} ≥
3 ln 3, we know that with probability at least 2/3, (i) S_{lg k'} has at least ⌈2k'·p_{lg k'}⌉ ≥ c elements qualifying q,
and (ii) the element of S_{lg k'}(q) with the c-th largest score has rank between k' and 4k' in S(q). When both
conditions hold, the algorithm issues a τ-reporting query that returns at most 4k' elements, and terminates
after k-selection. In this case, the cost of this round is at most T_topc + T_rep + O(k'/B). The algorithm
does not finish at this round with probability at most 1/3 due to two possibilities: the τ-reporting query
on S fetched fewer than k' objects, or this query aborted. For either possibility, this round performs at most
T_topc + T_rep + O(k'/B) I/Os.
In summary, in the i-th (i ≥ 1) round, the algorithm terminates with probability at least 2/3, and goes
into the next round with the remaining probability. In any case, a round performs T_topc + T_rep + O(2^i · k/B)
I/Os (k ≤ k' < 2^i · k in the i-th round). Due to the independence of all the sample sets, the i-th round is
executed with probability at most (1/3)^{i−1}. The expected query cost is therefore at most

$$ T_{cnt} + \sum_{i=1}^{\infty} (1/3)^{i-1} \bigl( T_{topc} + T_{rep} + O(2^i \cdot k/B) \bigr) = T_{cnt} + O(T_{topc} + T_{rep} + k/B). $$

It can be easily verified that the query algorithm is Las Vegas. □
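For concreteness, the geometric sums behind the last bound can be evaluated explicitly (our own arithmetic, treating each O(2^i · k/B) term as a fixed constant times 2^i · k/B):

$$ \sum_{i=1}^{\infty} (1/3)^{i-1} = \frac{3}{2}, \qquad \sum_{i=1}^{\infty} (1/3)^{i-1} \cdot 2^i = 2 \sum_{i=1}^{\infty} (2/3)^{i-1} = 6, $$

so the expected cost is at most T_cnt + (3/2)(T_topc + T_rep) + 6 · O(k/B). The series converges because each round at most doubles k', while the probability of reaching the round shrinks by a factor of 3.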

Remark. Top-constant-score queries are seldom specifically studied in the literature. A related, more 
popular, topic is the max-score problem, namely, top-1-score search. Extending a max-score structure into
a top-constant-score one is often effortless. In particular, if the max-score structure supports fast updates, 
a simple way of answering a top-c-score query is to repeat the following c times: issue a max-score query, 
retrieve the element of the maximum score (recall that all elements have distinct scores), and delete the 
element from the structure. After the query, insert the c deleted elements back into the structure. 
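The recipe in this remark can be sketched for one concrete predicate family: index ranges over a fixed array of scores, with a bottom-up segment tree serving as the max-score structure. This is a minimal sketch under that assumption (the paper's max-score structures are different); a deletion is modelled by overwriting a score with −∞, and the deleted scores are written back afterwards. `RangeMaxTree` and `top_c_score` are hypothetical names, and the ∞ return value follows the convention for |S(q)| < c stated in Section 1.1.

```python
import math


class RangeMaxTree:
    """Bottom-up segment tree over positions 0..n-1: a max-score structure for
    range predicates [lo, hi] that supports point updates."""

    def __init__(self, scores):
        self.n = len(scores)
        self.t = [(-math.inf, -1)] * (2 * self.n)
        for i, s in enumerate(scores):
            self.t[self.n + i] = (s, i)  # leaves hold (score, position)
        for v in range(self.n - 1, 0, -1):
            self.t[v] = max(self.t[2 * v], self.t[2 * v + 1])

    def update(self, i, score):
        v = self.n + i
        self.t[v] = (score, i)
        while v > 1:  # refresh every ancestor of the leaf
            v //= 2
            self.t[v] = max(self.t[2 * v], self.t[2 * v + 1])

    def max_score(self, lo, hi):
        """Return (score, position) of the maximum over positions lo..hi."""
        best = (-math.inf, -1)
        l, r = self.n + lo, self.n + hi + 1
        while l < r:
            if l & 1:
                best = max(best, self.t[l])
                l += 1
            if r & 1:
                r -= 1
                best = max(best, self.t[r])
            l //= 2
            r //= 2
        return best


def top_c_score(tree, lo, hi, c):
    """c-th highest score in lo..hi (infinity if fewer than c elements):
    c rounds of max query + deletion, then re-insert what was deleted."""
    removed = []
    for _ in range(c):
        s, pos = tree.max_score(lo, hi)
        if s == -math.inf:  # no live element left in the range
            break
        removed.append((pos, s))
        tree.update(pos, -math.inf)  # "delete" by overwriting with -infinity
    for pos, s in removed:
        tree.update(pos, s)  # restore the structure
    return removed[-1][1] if len(removed) == c else math.inf


tree = RangeMaxTree([5, 1, 9, 3, 7, 2])
print(top_c_score(tree, 0, 5, 2))  # 7
```

Each top-c-score query costs c max-queries plus 2c updates, i.e., O(c lg n) time here, which is O(lg n) for the constant c used by the reduction.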

2.3 Applications 

We now apply Theorem 1 to solve the top-k problems mentioned in Section 1.1.

Corollary 1. For top-k range reporting, there is a structure that uses linear space (expected and w.h.p.),
answers a query in O(lg_B n + k/B) expected I/Os, and can be updated in O(lg_B n) amortized I/Os (expected
and w.h.p.). The same is true for top-k stabbing. For top-k 3-sided range search, there is a structure using
O((n/B) · lg(n/B) / lg lg_B n) space (expected and w.h.p.), and answering a query in O(lg_B n + k/B) expected I/Os.

Our (randomized) structure for top-k 3-sided range search achieves the best space-query tradeoff one
can hope for with any deterministic structure. 

Theorem 2. Under the indexability model of [18], any deterministic structure for top-k 3-sided range search
must use Ω((n/B) · lg(n/B) / lg lg_B n) space if it guarantees query time O(lg_B^c n + k/B) for any constant c.

3 Improved structure for top-k range reporting

In this section, we present a better structure for top-k range reporting, which achieves O(lg_B n + k/B)
query time w.h.p. Recall that the input S is a set of points in ℝ, each of which is associated with a score in
ℝ. Given an interval q = [x_1, x_2] in ℝ and an integer k ≥ 1, a query returns the k elements in S(q) = S ∩ q
with the highest scores. We will assume |S(q)| ≥ 4k because otherwise we can retrieve the entire S(q)
from a B-tree on S, and apply k-selection. Unless otherwise stated, in a B-tree, a leaf node stores at most B
elements, and an internal node has at most B child nodes.

3.1 Handling k = Ω(lg n · lg_B n)

Maintain a p-sample set R of the input set S, where p = 1/lg_B n. Store R in an approximate range
k-selection structure T_sel of [27]. Furthermore, index S and R respectively with an augmented B-tree for
answering count queries, i.e., given q = [x_1, x_2], return |S(q)| and |R(q)|, where R(q) = R ∩ q. Index S
with another B-tree that allows us to retrieve efficiently all the points in S(q), and also an external priority
search tree T_prior [?] to support τ-reporting queries on S.
Given a top-k query with q = [x_1, x_2] and k ≥ 7 ln n · lg_B n, first check whether |R(q)| ≥ 2kp. If yes,
perform approximate range k'-selection with q = [x_1, x_2] and k' = ⌈2kp⌉ on T_sel; let s be the score of the
retrieved element. Otherwise (i.e., |R(q)| < 2kp), set s = ∞ directly. Use T_prior to retrieve all the points of
S ∩ q whose scores are at least s. If at least k elements are fetched, find the top-k elements with k-selection.
Otherwise (i.e., fewer than k elements are fetched), retrieve the entire S(q) and find the top-k elements with
k-selection.
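A quick check of where the threshold on k comes from, under our reading that the sampling error here is controlled by Lemma 1 with the rate p = 1/lg_B n (the paper's own proof is in the appendix and may argue differently): for every k ≥ 7 ln n · lg_B n,

$$ kp \;\ge\; 7 \ln n \;=\; 3 \ln\bigl(n^{7/3}\bigr), $$

so the premise kp ≥ 3 ln(1/δ) of Lemma 1 holds with δ = n^{−7/3} ≤ 1/n^2. In other words, the sample behaves as intended w.h.p. in the sense fixed in Section 1.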

Lemma 2. For top-k range reporting with k ≥ 7 ln n · lg_B n, there is a structure that uses O(n/B) space
in the worst case, correctly answers every query with O(lg_B n + k/B) I/Os in expectation and w.h.p., and
supports n updates on an initially empty input with O(n lg_B n) I/Os in total, in expectation and w.h.p.

3.2 The approximate (f, l)-group k-selection problem

In this section, we give a dynamic structure solving the approximate (f, l)-group k-selection problem
(henceforth, the (f, l)-problem) defined in Section 1.2 when f = lg^ε n, l = O(polylg n), and lg lg n > √B.
Abusing notation slightly, we use G also to denote the union of G_1, ..., G_f.

Logarithmic sketch. We first review the logarithmic sketch (henceforth, sketch) developed in [27]. Let L
be a set of l ordered elements. Its sketch Σ is an array of size ⌊log_2 l⌋ + 1, where the j-th (j ≥ 1) entry Σ[j]
is an element in L whose rank in L falls in [2^{j−1}, 2^j) (any such element can be used for Σ[j]). We refer to
Σ[j] as a pivot. The following is from [27]:
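A sketch of a static set can be built with one sort. The code below is our own illustration (`log_sketch` and `rank_in` are hypothetical names): among the admissible choices for Σ[j], it picks the element of rank min(⌊(3/2) · 2^{j−1}⌋, l), which mirrors the pivot re-computation rule used later in this section and always lies in [2^{j−1}, 2^j).

```python
def rank_in(L, e):
    """Rank of e in L: number of elements of L that are at least e."""
    return sum(1 for x in L if x >= e)


def log_sketch(L):
    """Logarithmic sketch of L: entry j (1-based, j = 1..floor(log2 |L|) + 1) is an
    element of L whose rank lies in [2**(j-1), 2**j). This sketch picks rank
    min(floor(1.5 * 2**(j-1)), |L|)."""
    desc = sorted(L, reverse=True)  # desc[r - 1] has rank r
    size = len(desc).bit_length()   # floor(log2 |L|) + 1, or 0 for empty L
    return [desc[min(3 * 2 ** (j - 1) // 2, len(desc)) - 1] for j in range(1, size + 1)]


sketch = log_sketch(range(1, 11))
print(sketch)  # [10, 8, 5, 1]
```

For the last entry the cap at |L| is what keeps the chosen rank inside [2^{j−1}, 2^j): there 2^{j−1} ≤ |L| < 2^j, so the set may be too small for the midpoint rank.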

Lemma 3 ([27]). Let L_1, ..., L_m be m disjoint sets of elements drawn from an ordered domain. Given their
sketches and an integer k satisfying 1 ≤ k ≤ |∪_{i=1}^{m} L_i|, we can find in O(m) I/Os an element e ∈ ∪_{i=1}^{m} L_i
whose rank in ∪_{i=1}^{m} L_i is at least k but less than ck for some constant c > 1.

Static structure. We now describe a static structure for the (f, l)-problem. Create a sketch Σ_i for each G_i
(1 ≤ i ≤ f). We force Σ_i to have ⌊log_2 l⌋ + 1 pivots: if |G_i| is less than l, the last few pivots in Σ_i are
dummy. Call the set {Σ_1, ..., Σ_f} a sketch set.

We store a compressed form of the sketch set as follows. We describe each pivot e ∈ Σ_i by its global
rank in G using lg(fl) bits. Also, we associate e with its local rank in G_i, which can be described in lg l bits.
We use the same number O(lg(fl)) of bits for each pivot, even for a dummy one. Hence, each Σ_i occupies
O(lg l · lg(fl)) bits; and thus, a compressed sketch set occupies O(f lg l · lg(fl)) = O(lg^ε n · (lg lg n)^2) bits.
Note that this is less than the length of a word. There are at most h = 2^{O(lg^ε n · (lg lg n)^2)} possible
compressed sketch sets.
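The compressed form can be illustrated by packing a sketch set into a single integer with fixed-width fields. This is a minimal sketch under our own conventions, not the paper's exact bit layout: every rank gets one uniform field of ⌊lg(fl)⌋ + 1 bits (in line with "the same number O(lg(fl)) of bits for each pivot"), every sketch gets exactly ⌊log_2 l⌋ + 1 pivot slots, and a dummy pivot is encoded as (0, 0), which is safe because real ranks start at 1. `layout`, `pack_sketch_set` and `unpack_sketch_set` are hypothetical names.

```python
def layout(f, l):
    """Pivot slots per sketch and bits per rank value for an (f, l)-group."""
    slots = l.bit_length()        # floor(log2 l) + 1 pivots per sketch
    width = (f * l).bit_length()  # enough bits for any rank in 1..f*l
    return slots, width


def pack_sketch_set(sketch_set, f, l):
    """Pack f sketches of (global_rank, local_rank) pivots into one integer.
    Missing pivots are padded with the dummy (0, 0)."""
    slots, width = layout(f, l)
    word = 0
    for sketch in sketch_set:
        padded = list(sketch) + [(0, 0)] * (slots - len(sketch))
        for global_rank, local_rank in padded:
            word = (word << width) | global_rank
            word = (word << width) | local_rank
    return word


def unpack_sketch_set(word, f, l):
    """Inverse of pack_sketch_set (dummy pivots are dropped)."""
    slots, width = layout(f, l)
    mask = (1 << width) - 1
    values = []
    for _ in range(2 * f * slots):  # read fields from the low end
        values.append(word & mask)
        word >>= width
    values.reverse()
    sketch_set = []
    for g in range(f):
        base = 2 * g * slots
        pivots = [(values[base + 2 * j], values[base + 2 * j + 1]) for j in range(slots)]
        sketch_set.append([p for p in pivots if p != (0, 0)])
    return sketch_set


s = [[(1, 1), (4, 2)], [(2, 1), (3, 2), (6, 4), (12, 8)], []]
w = pack_sketch_set(s, 3, 8)
print(w.bit_length() <= 2 * 3 * 4 * 5, unpack_sketch_set(w, 3, 8) == s)  # True True
```

The packed integer has 2 · f · (⌊log_2 l⌋ + 1) · (⌊lg(fl)⌋ + 1) bits, i.e., O(f lg l · lg(fl)), matching the count in the text.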

Besides an integer k ∈ [1, fl], a query for the (f, l)-problem specifies a range [a_1, a_2] such that
1 ≤ a_1 ≤ a_2 ≤ f. Hence, there are less than f^2 · fl = f^3 l possible queries, each of which can be described in
O(lg(fl)) bits. For each possible combination of (compressed sketch set, query), we compute in O(f) I/Os
the query answer (using Lemma 3), which is an element in G, described by its global rank in O(lg(fl))
bits. Create a query lookup table T_qry where each entry corresponds to a (compressed sketch set, query)
pair, and stores the query answer as a global rank. All entries are arranged in the lexicographic order of
the bit-description (referred to as a bit-string) of the corresponding (compressed sketch set, query) pair to
allow constant-time lookups. T_qry occupies

$$ O(h \cdot f^3 l \cdot \lg(fl)) = 2^{O(\lg^{\varepsilon} n \cdot (\lg\lg n)^2)} \cdot O(\mathrm{polylg}\, n) = o(n) $$

bits, and can be computed in O(h · f^3 l · f) = O(h · polylg n) = o(n) I/Os.
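The lookup mechanism (append the query's bit-string to the state's bit-string and use the result as a table index) can be shown on a toy problem. Everything below is a hypothetical stand-in chosen to keep the table small: the "state" is an 8-bit mask instead of a compressed sketch set, and the query asks for the position of the k-th set bit within positions a_1..a_2, scanned from a_2 downward. Only the tabulation pattern, not the query semantics, corresponds to T_qry.

```python
W = 8   # toy state width in bits (stand-in for a compressed sketch set)
QB = 3  # bits per query field (a1, a2, k - 1)


def answer_brute(state, a1, a2, k):
    """Toy query: position of the k-th set bit of `state`, scanning positions
    a2, a2 - 1, ..., a1; -1 if there are fewer than k set bits there."""
    seen = 0
    for pos in range(a2, a1 - 1, -1):
        if (state >> pos) & 1:
            seen += 1
            if seen == k:
                return pos
    return -1


def table_index(state, a1, a2, k):
    """Concatenate the state's bit-string with the query's bit-string."""
    idx = state
    for field in (a1, a2, k - 1):
        idx = (idx << QB) | field
    return idx


def build_table():
    """Precompute the answer for every (state, query) pair."""
    table = [-1] * (1 << (W + 3 * QB))
    for state in range(1 << W):
        for a1 in range(W):
            for a2 in range(a1, W):
                for k in range(1, W + 1):
                    table[table_index(state, a1, a2, k)] = answer_brute(state, a1, a2, k)
    return table


TABLE = build_table()
print(TABLE[table_index(0b10110010, 0, 7, 2)])  # 5
```

After preprocessing, every query is a single array access, regardless of how expensive `answer_brute` is; the price is a table with 2^(W + 3·QB) entries, which is why the text needs the compressed sketch set to be short enough that h stays sublinear.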

Given a query, we append its bit-string to that of the current compressed sketch set to acquire the
corresponding entry of T_qry in constant time. Remember that the query answer is given as a global rank. We
therefore index all the elements of G with a B-tree, so that we can convert a global rank to an actual element
in O(lg_B(fl)) I/Os. The B-tree occupies O(fl/B) space.²

Update. Now we make our structure dynamic. First, create a B-tree on the elements of each G_i, and
another B-tree on the (uncompressed) pivots of each Σ_i. Unlike a compressed sketch, the B-tree on Σ_i does
not contain the global and local ranks of the pivots. At all times, we let each pivot in (the B-tree of) Σ_i
and its copy in (the B-tree of) G_i keep a pointer to each other. Later, we need to scan the elements of G_i
between two pivots. Those pointers allow us to do so in O(lg_B lg l + t/B) I/Os if t elements are scanned.
During a node split/merge in a relevant B-tree, such pointers are properly maintained using O(B) I/Os. As
Ω(B) updates must have taken place in the node being split/merged, the O(B) cost can be amortized so
that an update bears only O(1) I/Os. In the sequel, we sometimes need to add or remove the last pivot of
some Σ_i. This can be done in the compressed form of Σ_i and the B-tree on Σ_i in O(1) and O(lg_B lg l) I/Os,
respectively.

Suppose that an element e_new is to be inserted in G_i for some i ∈ [1, f]. Let r_new be the rank of e_new
in G before the insertion. Except perhaps a single pivot, the new compressed sketch set (after the update)
can be deduced from the current compressed sketch set, r_new and i. To understand this, consider first a
compressed sketch Σ_{i'} where i' ≠ i. Each pivot whose global rank is at least r_new now has its global rank
increased by 1 (its local rank is unaffected). Regarding the compressed Σ_i, the same is true, but additionally
every such pivot should also have its local rank increased by 1. Furthermore, a new pivot is needed in Σ_i if
|G_i| reaches a power of 2 after the insertion; in such a case we say that Σ_i expands. The new pivot is the
only one in the compressed sketch set that cannot be deduced (because its global rank is unknown).
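The deducible part of an insertion is a pure function of (sketch set, r_new, i), which is exactly what makes it tabulable. The sketch below computes that function on a toy, uncompressed representation; it is our own illustration, and two details are assumptions: |G_g| is carried explicitly in the state (so that expansion can be detected), and r_new is read as the rank position that e_new takes in G, so that pivots with global rank at least r_new move down by one. `apply_insertion` is a hypothetical name.

```python
def apply_insertion(sketch_set, sizes, r_new, i):
    """Transition of a (toy, uncompressed) sketch set when an element of global
    rank r_new is inserted into group i. sketch_set[g] lists the real pivots of
    group g as (global_rank, local_rank); sizes[g] = |G_g| (assumed to be part
    of the state here). Returns (new_sketch_set, new_sizes, expands, invalid),
    where invalid lists the (group, j) pivots whose local rank left [2**(j-1), 2**j)."""
    new_set, invalid = [], []
    for g, sketch in enumerate(sketch_set):
        new_sketch = []
        for j, (gr, lr) in enumerate(sketch, start=1):
            if gr >= r_new:  # the pivot moves one position down in G ...
                gr += 1
                if g == i:   # ... and also in its own group G_i
                    lr += 1
            if not 2 ** (j - 1) <= lr < 2 ** j:
                invalid.append((g, j))
            new_sketch.append((gr, lr))
        new_set.append(new_sketch)
    new_sizes = list(sizes)
    new_sizes[i] += 1
    expands = (new_sizes[i] & (new_sizes[i] - 1)) == 0  # |G_i| became a power of 2
    return new_set, new_sizes, expands, invalid


state = [[(1, 1), (3, 2)], [(2, 1)]]
print(apply_insertion(state, [3, 1], 2, 0))
# ([[(1, 1), (4, 3)], [(3, 1)]], [4, 1], True, [])
```

In the structure described in the text, this computation is done once per (Σ, r_new, i) during preprocessing, and an insertion only reads the precomputed result, including the list of invalidated pivots.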

Motivated by this observation, we precompute another table T_ins called the insertion lookup table. Let
Σ denote a possible compressed sketch set. T_ins has an entry for every possible combination of (Σ, r_new, i),
where r_new and i are as explained before. This entry contains the new compressed sketch set determined
by (Σ, r_new, i), excluding the new pivot if Σ_i needs to expand. T_ins has h · fl · f entries, whereas each
entry occupies O(lg^ε n · (lg lg n)^2) bits, i.e., the length of a compressed sketch set. We store the entries in
the lexicographic order of the bit-strings of the corresponding (Σ, r_new, i) to allow constant-time lookups.
Overall, T_ins occupies O(h · polylg n) = o(n) bits.

Recall that, in a logarithmic sketch, the $j$-th pivot should have its local rank confined to $[2^{j-1}, 2^j)$. The
pivot is invalidated when its local rank equals $2^{j-1} - 1$ or $2^j$. When this happens, as in [27], we re-compute
the pivot to be the element with local rank $\lfloor \frac{3}{2} \cdot 2^{j-1} \rfloor$, so that $\Omega(2^j)$ updates in $G_i$ are needed for the new pivot
to be invalidated. The only exception is when the pivot has local rank $2^{j-1} - 1$ and is the last one in its sketch
set, but this situation happens only in deletion, which will be discussed later. Pivot re-computation must be
done online because we cannot predict the new pivot's global rank in advance. Therefore, in each entry
of $T_{\mathrm{ins}}$, we associate the new compressed sketch set $\Sigma'$ stored there with a pointer to a linked list, which
indicates all the invalidated pivots in $\Sigma'$. As each pivot can be described by two $O(\lg(fl))$-bit integers (i.e.,
which $\Sigma_i$ contains it, and its rank in $\Sigma_i$), all the linked lists occupy $O((h \cdot fl \cdot f) \cdot (fl \cdot \lg(fl))) = o(n)$ bits.
For each $(\Sigma, r_{\mathrm{new}}, i)$, we can easily compute $\Sigma'$, as well as all the invalidated pivots in $\Sigma'$, in $O(\mathrm{polylg}\, n)$
time, implying that $T_{\mathrm{ins}}$ can be generated in $o(n)$ I/Os.
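The validity condition and the re-computation target can be checked directly. The helpers below use hypothetical names for illustration; the last one measures how far a freshly re-computed pivot sits from the two invalidating local ranks, which is where the $\Omega(2^j)$ bound comes from (each update in $G_i$ moves a pivot's local rank by at most 1).

```python
def pivot_is_valid(j: int, local_rank: int) -> bool:
    """The j-th pivot (j >= 1) must keep its local rank in [2^(j-1), 2^j)."""
    return 2 ** (j - 1) <= local_rank < 2 ** j


def recompute_target(j: int) -> int:
    """Local rank floor((3/2) * 2^(j-1)) chosen for a re-computed j-th pivot."""
    return (3 * 2 ** (j - 1)) // 2


def min_updates_before_invalid(j: int) -> int:
    """Minimum number of updates in G_i before a freshly re-computed j-th pivot
    reaches local rank 2^(j-1) - 1 or 2^j, given that each update shifts its
    local rank by at most 1."""
    t = recompute_target(j)
    return min(t - (2 ** (j - 1) - 1), 2 ** j - t)
```

For $j \ge 2$ the target is $3 \cdot 2^{j-2}$, which lies $2^{j-2}$ below $2^j$ and $2^{j-2} + 1$ above $2^{j-1} - 1$.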

We are ready to clarify the full procedure of inserting $e_{\mathrm{new}}$ in $G_i$. After obtaining $r_{\mathrm{new}}$ from the B-tree
of $G$ in $O(\lg_B(fl))$ I/Os, we look up $T_{\mathrm{ins}}$ in constant I/Os (note: $(\Sigma, r_{\mathrm{new}}, i)$ can be described by $o(\lg n)$
bits) to obtain the new compressed sketch set, as well as the list of invalidated pivots, if any. If $|G_i|$ is now a
power of 2, we retrieve the global rank of the smallest element $e \in G_i$ in $O(\lg_B(fl))$ I/Os, and add $e$ to $\Sigma_i$
in $O(\lg_B \lg l)$ I/Os.

2 If one aims at designing only an external memory structure, the query lookup table is unnecessary. We mentioned earlier that
a sketch set can be stored within a single word, which can be loaded into memory with one I/O. Then, we can apply Lemma 3 to
answer a query in memory, for which purpose the algorithm of [27] requires only $O(1)$ words of memory and incurs no cost in
our context (recall that CPU time is free). This approach, however, does not work in word-RAM, where a penalty of $O(f)$ time
applies to each query. The tabulation approach above, on the other hand, allows us to handle a query in $O(\lg(fl))$ time even in
word-RAM.

Next, we deal with the invalidated pivots. For each such pivot (suppose that it is the $j$-th pivot of $\Sigma_i$),
we retrieve the element $e \in G_i$ with local rank $\lfloor \frac{3}{2} \cdot 2^{j-1} \rfloor$ in $O(\lg_B \lg l + 2^j/B)$ I/Os. This can be done by
scanning the elements of $G_i$ between $\Sigma_i[j]$ and its succeeding or preceding pivot in $\Sigma_i$ (if the local rank of
$\Sigma_i[j]$ is currently $2^{j-1} - 1$ or $2^j$, respectively). To acquire the global rank of $e$, we distinguish two cases:

• $2^j > B \lg_B(fl)$: there are $O(\lg l)$ such $j$ (i.e., the number of pivots in a sketch). We obtain the
global rank by searching the B-tree on $G$ in $O(\lg_B(fl))$ I/Os, and then update $\Sigma_i$ accordingly in
$O(\lg_B \lg l)$ I/Os. In total, we spend $O(\lg_B(fl) + 2^j/B) = O(2^j/B)$ I/Os to handle this invalidated
pivot. Since, as mentioned earlier, $\Omega(2^j)$ updates must have occurred in $G_i$ to trigger the invalidation of
$\Sigma_i[j]$, each of those updates accounts for $O(1/B)$ I/Os of the invalidation handling. As an update can
be charged at most $O(\lg l)$ times this way (i.e., once for every $j$), its amortized cost is increased by
only $O(\frac{\lg l}{B}) = O(\lg_B l)$.

• $2^j \le B \lg_B(fl)$: we retrieve the global rank of $e$ in $O(1)$ I/Os using:

Lemma 4. In $o(n)$ I/Os, we can precompute several tables of $o(n)$ bits in total. Given an instance
of the $(f, l)$-problem, there is a structure consuming $O(fl/B)$ space such that, given any integer $r \in
[1, B \lg_B(fl)]$ and $i \in [1, f]$, the global rank of the element with rank $r$ in $G_i$ can be retrieved in $O(1)$ I/Os.
The structure can be updated in $O(\lg_B(fl))$ I/Os per insertion and deletion. Both the query and update
algorithms need to consult the precomputed tables. The structure can be built in $O(fl \cdot \lg_B(fl))$ I/Os.

Now we can update $\Sigma_i$ by using $e$ to replace the original $\Sigma_i[j]$ in $O(\lg_B \lg l)$ I/Os. In total, we
spend $O(\lg_B \lg l + 2^j/B)$ I/Os to handle the invalidation of this pivot. We only need to worry about
the term $O(\lg_B \lg l)$ because the other term $O(2^j/B)$ can be accounted for in the way explained
earlier. Note that there are $O(\lg(B \lg_B(fl)))$ values of $j$ satisfying $2^j \le B \lg_B(fl)$. In other words, an
insertion can trigger the invalidation of $O(\lg(B \lg_B(fl)))$ such pivots, meaning that the insertion cost is
increased by $O(\lg(B \lg_B(fl)) \cdot \lg_B \lg l)$. Plugging in the values of $f$ and $l$, we know

$$O(\lg(B \lg_B(fl))) = O(\lg B + \lg\lg_B(fl)) = O(\lg\lg\lg n + \lg\lg_B \lg n) = O(\lg\lg_B \lg n),$$

where the last equality used $O(\lg\lg\lg n) = O(\lg\lg_B \lg n)$ when $\sqrt{B} < \lg\lg n$. Hence,

$$O(\lg(B \lg_B(fl)) \cdot \lg_B \lg l) = O(\lg\lg_B \lg n \cdot \lg_B \lg\lg n) = O((\lg\lg_B \lg n)^2) = o(\lg_B(fl)).$$

Therefore, an insertion can be performed in $O(\lg_B(fl))$ amortized I/Os. A similar technique handles a
deletion with the same cost (see the proof of the next lemma). Combining this with our earlier discussion on query
processing, we obtain:

Lemma 5. In $o(n)$ I/Os, we can precompute several tables of $o(n)$ bits in total. Given an instance of
the $(f, l)$-problem, there is a structure consuming $O(fl/B)$ space that (using also the precomputed tables)
answers a query in $O(\lg_B(fl))$ I/Os, and can be updated in $O(\lg_B(fl))$ I/Os per insertion and deletion.
The structure can be built in $O(fl \cdot \lg_B(fl))$ I/Os.

We remark that it is unnecessary to improve the $o(n)$ precomputation time to $O(n/B)$. This is because
we can re-compute the relevant tables whenever $n$ has changed by a constant factor. The amortized cost per
update is only $o(1)$, as discussed in detail later.

3.3 Approximate union-rank selection 

This subsection considers the following approximate union-rank problem. Let $L_1, \ldots, L_m$ be $m$ disjoint sets,
whose elements are drawn from an ordered domain. An algorithm is allowed only two operators to access
$L_i$ ($1 \le i \le m$):
• max-operator, which fetches the largest element of $L_i$ in $\mathrm{cost}_{\max}$ time.

• rank-operator, which takes a real-valued parameter $r \in [1, |L_i|/c]$, and returns an element whose
rank in $L_i$ falls in $[r, cr)$, where $c \ge 2$ is a constant. The operator takes $\mathrm{cost}_{\mathrm{rank}}$ time.

Given an integer $k \ge 1$ that is at most a constant fraction of $\min\{|L_1|, \ldots, |L_m|\}$, the algorithm should return an element whose
rank in $L_1 \cup \ldots \cup L_m$ falls in $[k, c'k)$, where $c' > 1$ is a constant dependent only on $c$.
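To make the operator contract concrete, here is a hypothetical stand-in for a single set $L_i$, meant only as a model for experimentation. Its `rank_op` returns an element of an arbitrary (here, randomly chosen) admissible rank in $[r, cr)$, reflecting that an algorithm may not rely on which such element it receives.

```python
import math
import random


class ApproxRankSet:
    """Hypothetical stand-in for one set L_i exposing only the two operators."""

    def __init__(self, elements, c: float = 2.0, seed: int = 0):
        if c < 2:
            raise ValueError("the problem assumes c >= 2")
        self._desc = sorted(elements, reverse=True)  # rank 1 = largest element
        self.c = c
        self._rng = random.Random(seed)

    def max_op(self):
        """max-operator: the largest element of L_i."""
        return self._desc[0]

    def rank_op(self, r: float):
        """rank-operator: some element whose rank lies in [r, c*r)."""
        if not 1 <= r <= len(self._desc) / self.c:
            raise ValueError("r must lie in [1, |L_i|/c]")
        lo = math.ceil(r)                 # smallest integer rank >= r
        hi = math.ceil(self.c * r) - 1    # largest integer rank < c*r
        rank = self._rng.randint(lo, hi)  # any admissible rank may be returned
        return self._desc[rank - 1]
```

Since $c \ge 2$ and $r \ge 1$, the interval $[r, cr)$ always contains an integer rank no larger than $|L_i|$, so `lo <= hi` holds whenever the precondition on `r` does.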

The classic algorithm of Frederickson and Johnson [14] is not immediately applicable because it assumes
a more powerful rank-operator that retrieves the element with a precise rank in $L_i$. We extend their algorithm
to obtain:

Lemma 6. The approximate union-rank problem can be solved in $O(m(\mathrm{cost}_{\max} + \mathrm{cost}_{\mathrm{rank}}))$ time.

3.4 Top-k range reporting for k = O(polylg n)

We now give a deterministic structure for approximate range $k$-selection with $k \le l = O(\mathrm{polylg}\, n)$. We
discuss only $\lg\lg n > \sqrt{B}$ in this subsection, and leave the opposite case to the appendix (see the proof of
Theorem 3).

The base tree of our structure is a weight-balanced B-tree $T$ on the points of the input set $S$. Each
internal node of $T$ has at most $f = \lg^{\epsilon} n$ child nodes, where $\epsilon < 1$ is a constant. Each leaf node stores at
most $b = flB$ elements. Given a node $u$, denote by $S_u$ the set of elements stored in the subtree of $u$, and
by $G_u$ the $c^2 l$ elements in $S_u$ with the highest scores, where $c$ is the constant in Lemma 3. For each leaf
node $z$, maintain a structure of [27] for approximate range $k$-selection on $S_z$. Consider an internal node $u$
with child nodes $u_1, \ldots, u_f$, where the subscripts reflect the ordering of the elements in their subtrees. We
maintain an $(f, c^2 l)$-structure for solving the $(f, c^2 l)$-problem on the $(f, c^2 l)$-group $G_u = (G_{u_1}, \ldots, G_{u_f})$.
Finally, we store the elements of $G_{u_1} \cup \ldots \cup G_{u_f}$ in a B-tree so that, given any $1 \le \alpha_1 \le \alpha_2 \le f$, we can find
the maximum score of the elements in $\bigcup_{i \in [\alpha_1, \alpha_2]} G_{u_i}$ using $O(\lg_B(fl))$ I/Os.

Given a top-$k$ query with range $q = [x_1, x_2]$, search $T$ in a standard way to obtain $m = O(\lg_f n)$
canonical subsets that form a partition of $S \cap q$. Specifically, a canonical subset is either $q \cap S_z$ for some
leaf node $z$, or the union of the elements in the subtrees of several consecutive child nodes of an internal node $u$;
denote this union as $L_u$, and let $A$ be the set of all such $u$. We perform approximate rank-$k$ selection on
$\bigcup_{u \in A} L_u$ using Lemma 6; notice that the $(f, c^2 l)$-structure of $u$ and the B-tree on $G_u$ allow us to implement
the rank- and max-operators, respectively, in $O(\lg_B(fl))$ I/Os (see Lemma 5). Therefore, the approximate
rank-$k$ selection finishes in $O(m \lg_B(fl)) = O(\lg_f n \cdot \lg_B f) = O(\lg_B n)$ I/Os (note that $\lg(fl) = O(\lg f)$
because $l = O(\mathrm{polylg}\, n)$ and $f = \lg^{\epsilon} n$); let $e$ be the element returned. For each leaf node $z$ such that
$q \cap S_z$ is a canonical subset, perform approximate range $k$-selection on $S_z$ using $q$ in $O(\lg_B b) = O(\lg_B n)$ I/Os.
There are at most two such leaf nodes; let $e_1, e_2$ be the results of the approximate range $k$-selection on them,
respectively. We return $\max\{e, e_1, e_2\}$ as the final answer.

Moving the other standard details to the appendix, we conclude with our last main result: 

Theorem 3. For top-k range reporting, there is a structure that uses $O(n/B)$ space in the worst case,
correctly answers every query with $O(\lg_B n + k/B)$ I/Os in expectation and w.h.p., and can be updated in
$O(\lg_B n)$ I/Os in expectation and w.h.p. For $k = O(\mathrm{polylg}\, n)$, the query and update complexities hold in
the worst case.
Appendix 1: Chernoff bounds



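Let $X_1, \ldots, X_n$ be independent Bernoulli variables such that $\Pr[X_i = 1] = p_i$. Let $X = \sum_{i=1}^{n} X_i$ and
$\mu = E[X] = \sum_{i=1}^{n} p_i$. It holds that:

• for any $\alpha \in (0, 1)$:

$$\Pr[X \ge (1 + \alpha)\mu] \le e^{-\alpha^2 \mu/3} \qquad (1)$$
$$\Pr[X \le (1 - \alpha)\mu] \le e^{-\alpha^2 \mu/3} \qquad (2)$$

• for any $\alpha \ge 2$:

$$\Pr[X \ge \alpha\mu] \le e^{-\alpha\mu/6} \qquad (3)$$

• for any $\alpha \ge 6\mu$:

$$\Pr[X \ge \alpha] \le 2^{-\alpha} \qquad (4)$$

The above inequalities can be found in many papers and textbooks, e.g., [16, 24].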
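These inequalities can be sanity-checked numerically. The sketch below is our own illustration rather than part of the paper: it covers only the special case $p_i = p$ for all $i$ (so $X$ is binomial), computes exact tails with `math.comb`, and compares them with the right-hand sides of inequalities (1) to (4).

```python
import math


def upper_tail(n: int, p: float, t: float) -> float:
    """Exact Pr[X >= t] for X ~ Binomial(n, p)."""
    start = max(0, math.ceil(t))
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(start, n + 1))


def lower_tail(n: int, p: float, t: float) -> float:
    """Exact Pr[X <= t] for X ~ Binomial(n, p)."""
    stop = min(n, math.floor(t))
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(0, stop + 1))


def check_bounds(n: int, p: float) -> bool:
    """True if inequalities (1)-(4) hold for Binomial(n, p) at sample parameters."""
    mu = n * p
    ok = True
    for a in (0.1, 0.5, 0.9):                     # inequalities (1) and (2)
        ok &= upper_tail(n, p, (1 + a) * mu) <= math.exp(-a * a * mu / 3)
        ok &= lower_tail(n, p, (1 - a) * mu) <= math.exp(-a * a * mu / 3)
    for a in (2, 3, 5):                           # inequality (3)
        ok &= upper_tail(n, p, a * mu) <= math.exp(-a * mu / 6)
    for t in (6 * mu, 8 * mu):                    # inequality (4)
        ok &= upper_tail(n, p, t) <= 2.0 ** (-t)
    return ok
```

The check is only a spot test at a few parameter values; the inequalities themselves are standard results and hold for all $n$ and $p$.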



Appendix 2: Proof of Lemma 1

First, we bound the probability that $|R| < 2kp$. For each $1 \le i \le |L|$, let $X_i$ be 1 if the $i$-th element of $L$ is
sampled, or 0 otherwise. Let $X = \sum_{i=1}^{|L|} X_i$, so that $|R| = X$. Then, $E[X] \ge 4kp$. Hence:

$$\begin{aligned}
\Pr[|R| < 2kp] &= \Pr[X < (1/2) \cdot 4kp] \\
&\le \Pr[X < (1/2) \cdot E[X]] \\
&\le \exp(-E[X]/12) && \text{(by Chernoff bound (2))}.
\end{aligned}$$

To make the above at most $\delta/3$, we need $E[X] \ge 12 \ln(3/\delta)$, for which it suffices to have $kp \ge 3 \ln(3/\delta)$.

From now on, let $e$ be the element with rank $\lceil 2kp \rceil$ in $R$, and let $\kappa$ be the rank of $e$ in $L$. Consider the event
$\kappa > 4k$. For each $i \in [1, 4k]$, let $Y_i$ be an indicator variable which equals 1 if the $i$-th greatest element in $L$
is sampled, or 0 otherwise. Let $Y = \sum_{i=1}^{4k} Y_i$, i.e., the number of samples from the greatest $4k$ elements
in $L$. Clearly, $E[Y] = 4kp$. Event $\kappa > 4k$ implies $Y \le \lceil 2kp \rceil - 1$. We thus have:

$$\begin{aligned}
\Pr[\kappa > 4k] &\le \Pr[Y \le \lceil 2kp \rceil - 1] \\
&= \Pr[Y < 2kp] && \text{(as $Y$ is an integer)} \\
&= \Pr[Y < (1/2) \cdot E[Y]] \\
&\le \exp(-E[Y]/12) && \text{(by Chernoff bound (2))}.
\end{aligned}$$

To make the above at most $\delta/3$, it suffices to have $kp \ge 3 \ln(3/\delta)$.

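Next, we consider the event $\kappa < k$. For each $i \in [1, k]$, let $Z_i$ be an indicator variable that is 1 if the $i$-th
greatest element in $L$ is sampled, or 0 otherwise. Define $Z = \sum_{i=1}^{k} Z_i$. Thus, $E[Z] = kp$. Event $\kappa < k$
implies $Z \ge \lceil 2kp \rceil$. Therefore:

$$\begin{aligned}
\Pr[\kappa < k] &\le \Pr[Z \ge 2kp] \\
&= \Pr[Z \ge 2 \cdot E[Z]] \\
&\le \exp(-E[Z]/3) && \text{(by Chernoff bound (3))}.
\end{aligned}$$

To make the above at most $\delta/3$, we need $E[Z] \ge 3 \ln(3/\delta)$. It suffices to make $kp \ge 3 \ln(3/\delta)$. By the union
bound, when $kp \ge 3 \ln(3/\delta)$, none of the three events above occurs with probability at least $1 - \delta$.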
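The statement being proved can also be checked empirically. The simulation below is our own illustration, not part of the paper: it takes $L$ to be the ranks $1, \ldots, |L|$, samples each element independently with probability $p$, and counts how often $|R| < 2kp$ or the sample of rank $\lceil 2kp \rceil$ has rank outside $[k, 4k]$ in $L$. With $|L| \ge 4k$ and $kp \ge 3\ln(3/\delta)$, the observed failure rate should stay below $\delta$.

```python
import math
import random


def kth_sample_rank(n_elems: int, k: int, p: float, rng: random.Random):
    """Sample each element of L = {rank 1, ..., rank n_elems} with probability p.

    Returns the rank in L of the sample whose rank in R is ceil(2kp), or None
    if |R| < 2kp (rank 1 = greatest element).
    """
    sampled = [rank for rank in range(1, n_elems + 1) if rng.random() < p]
    if len(sampled) < 2 * k * p:
        return None
    return sampled[math.ceil(2 * k * p) - 1]   # samples are listed by increasing rank


def failure_rate(n_elems: int, k: int, p: float, trials: int, seed: int = 1) -> float:
    """Fraction of trials in which |R| < 2kp or the selected rank leaves [k, 4k]."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        kappa = kth_sample_rank(n_elems, k, p, rng)
        if kappa is None or not k <= kappa <= 4 * k:
            fails += 1
    return fails / trials
```

For instance, with $|L| = 800$, $k = 200$ and $p = 1/16$ we get $kp = 12.5 \ge 3\ln 30$, matching the condition for $\delta = 0.1$.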






Appendix 3: Completing the proof of Theorem 1

It remains to analyze the update cost. We focus on the top-constant-score structures because the update
time of the counting and $\tau$-reporting structures is straightforward. Recall that there is a top-constant-score
structure on each sample set $S_i$ ($0 \le i \le \lfloor \lg n \rfloor$). Let $X_i$ be the number of insertions performed on $S_i$.

Set $X = \sum_{i=0}^{\lfloor \lg n \rfloor} X_i$. We will show that $X = O(n)$ with probability at least $1 - 1/n^2$. Since $X \le n(\lfloor \lg n \rfloor + 1)$
always holds, this implies $E[X] = O(n)$. As there cannot be more deletions on each $S_i$ than insertions, this will complete the proof.

Set $c = 3 \ln 9$. It suffices to consider only $i$ satisfying $2^i > 3 \ln 9$. $X_i$ is the sum of at least $n/2$ but at
most $n$ independent Bernoulli variables, each of which equals 1 with probability $c/2^i$. Set $\mu_i = E[X_i] \in
[cn/2^{i+1}, cn/2^i]$. Note that $\mu_i \ge 1$ for all $i \in [0, \lfloor \lg n \rfloor]$.

For $i < \lg(cn) - \lg(9 \ln n) - 1$, by Chernoff bound (3), $\Pr[X_i > 2\mu_i] \le \exp(-\mu_i/3)$, which can be
easily verified to be at most $1/n^3$. Now, consider $i \ge \lg(cn) - \lg(9 \ln n) - 1$. Note that there are $O(\lg\lg n)$
such $i$. Furthermore, for every such $i$, it holds that $2^{i+1} \ge cn/(9 \ln n)$, which means $\mu_i = O(\lg n)$. By
Chernoff bound (4), when $n \ge 4$, $\Pr[X_i > \lg(n^3) \cdot \mu_i] \le 2^{-\lg(n^3) \cdot \mu_i} \le 2^{-\lg(n^3)} = 1/n^3$. In other words,
with probability at least $1 - O(\lg\lg n/n^3)$, the sum of $X_i$ over all $i \ge \lg(cn) - \lg(9 \ln n) - 1$ is at most
$O(\lg\lg n \cdot \lg^2 n)$.

The above discussion shows that, with probability at least $1 - 1/n^2$, $X \le \sum_{i=0}^{\lfloor \lg n \rfloor} 2\mu_i + O(\lg\lg n \cdot
\lg^2 n) = O(n)$.
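The two claims that are "easily verified" above can be checked numerically for concrete $n$. The helper below (a hypothetical name) asserts $\exp(-\mu_i/3) \le 1/n^3$ below the cutoff $\lg(cn) - \lg(9\ln n) - 1$ and $\mu_i \le 18 \ln n$ at or above it, using in each case the worst end of $\mu_i \in [cn/2^{i+1}, cn/2^i]$, and returns the number of indices at or above the cutoff. That count is at most $\lg(9\ln n) - \lg c + 2 = O(\lg\lg n)$.

```python
import math

C = 3 * math.log(9)  # the constant c of this appendix


def check_thresholds(n: int) -> int:
    """Check both numeric claims for every i in [0, floor(lg n)] with 2^i > c.

    Returns how many such i lie at or above the cutoff lg(cn) - lg(9 ln n) - 1.
    """
    ln_n = math.log(n)
    cutoff = math.log2(C * n) - math.log2(9 * ln_n) - 1
    above = 0
    for i in range(n.bit_length()):           # i = 0, ..., floor(lg n)
        if 2 ** i <= C:
            continue
        mu_low, mu_high = C * n / 2 ** (i + 1), C * n / 2 ** i
        if i < cutoff:
            # exp(-mu_i / 3) <= 1/n^3, checked with the smallest possible mu_i
            assert math.exp(-mu_low / 3) <= n ** -3 * (1 + 1e-9)
        else:
            # mu_i <= 18 ln n = O(lg n), checked with the largest possible mu_i
            assert mu_high <= 18 * ln_n * (1 + 1e-9)
            above += 1
    return above
```

The small relative tolerance only absorbs floating-point rounding near the cutoff.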

Appendix 4: Proof of Corollary 1

Top-k range reporting. Both count and max-score queries can be supported by a slightly augmented B-
tree, whereas a $\tau$-reporting query can be answered using the external priority search tree of Arge,
Samoladas and Vitter. All structures use linear space, answer a query in logarithmic I/Os (plus
linear output cost), and can be updated in logarithmic I/Os per insertion and deletion. Using the
extension explained in Section 2, the max-score B-tree also answers a top-constant-score query in
logarithmic I/Os.

Top-k stabbing. Count queries can again be supported with an augmented B-tree (e.g., [29]). A max-score
structure was given by Agarwal et al., and (using the extension in Section 2) can be modified to
handle a top-constant-score query. A $\tau$-reporting query can be answered by the structure of Tao [28].
All structures use linear space, answer a query in logarithmic I/Os (with linear output cost), and can
be updated in logarithmic amortized I/Os.

Top-k 3-sided range search. A count query can be answered by the linear-space structure of Govindarajan,
Agarwal and Arge [15] in logarithmic I/Os. A max-score query can be answered by the linear-space
structure of Sheng and Tao [26] in logarithmic I/Os. Their structure can be easily extended to support
top-constant-score queries with the same performance guarantees. Finally, $\tau$-reporting is essentially
the $Q(3, 1)$ problem defined by Afshani, Arge and Larsen [1]. They gave a structure with … space
and $O(\lg_B n + t/B)$ query time.

The query and update costs in Corollary 1 follow immediately from Theorem 1. Regarding the space
usage, it suffices to show that if $S$ has $n$ elements, the total size of all its sample sets maintained in our
reduction is $O(n)$ with probability at least $1 - 1/n^2$. This can be proved following an argument similar to
the proof of Theorem 1 on the update cost.
Appendix 5: Proof of Theorem 2



It is well known [18] that, for the 4-sided orthogonal range search problem in 2d space, a deterministic
structure needs $\Omega(\frac{n}{B} \cdot \frac{\lg(n/B)}{\lg\lg_B n})$ space in the worst case to ensure $O(\lg^c_B n + k/B)$ query time, for any constant
$c$. In 3d space, the $Q(3, 1)$ problem (as defined in [1]) is more general than 4-sided orthogonal range search,
whose lower bound thus also applies to the $Q(3, 1)$ problem. Next, we show that if a deterministic structure
using $o(\frac{n}{B} \cdot \frac{\lg(n/B)}{\lg\lg_B n})$ space could answer a top-$k$ 3-sided range query in $O(\lg^c_B n + k/B)$ I/Os, the same
structure would also settle a $Q(3, 1)$ query with $O(\lg^c_B n + k/B)$ I/Os, thus reaching a contradiction.

In the $Q(3, 1)$ problem, the input is a set $S$ of $n$ points in $\mathbb{R}^3$. Given a rectangle $q = [x_1, x_2] \times [y, \infty) \times
[z, \infty)$, a query reports all the points in $S \cap q$. Suppose that we had an efficient structure using $o(\frac{n}{B} \cdot \frac{\lg(n/B)}{\lg\lg_B n})$
space as mentioned above. We use it to index $S$ by treating the $z$-coordinate of each point as its score. Then,
we answer a $Q(3, 1)$ query by a series of top-$k'$ queries with doubling $k'$. First, issue a top-$k'$ query with
$k' = B \lg^c_B n$ and the 3-sided rectangle $[x_1, x_2] \times [y, \infty)$. Let $S'$ be the set of retrieved points. If $S'$ contains
fewer than $k'$ points, or the minimum score of the points in $S'$ is below $z$, we have found all the points satisfying
the $Q(3, 1)$ query, and therefore terminate the algorithm. Otherwise, we double $k'$ and repeat.

The cost of the $i$-th top-$k'$ query is $O(\lg^c_B n + 2^{i-1} \lg^c_B n) = O(2^{i-1} \lg^c_B n)$. Hence, the cost of all
these queries is dominated by that of the last one. It is easy to see that the last top-$k'$ query has cost

$$O(\lg^c_B n + 2k/B) = O(\lg^c_B n + k/B),$$

where $k$ is the number of points reported by the $Q(3, 1)$ query.
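The doubling reduction can be sketched with a brute-force stand-in for the hypothetical top-$k$ 3-sided structure. Only the control flow is meant to match the argument: `top_k_3sided` is a linear scan rather than an efficient index, and `k_start` plays the role of $B \lg^c_B n$. Points are `(x, y, z)` tuples whose $z$-coordinate serves as the score.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z); z doubles as the score


def top_k_3sided(points: List[Point], x1: float, x2: float, y: float, k: int) -> List[Point]:
    """Brute-force stand-in (hypothetical): the k highest-scoring points of
    [x1, x2] x [y, inf), sorted by decreasing score."""
    hits = [p for p in points if x1 <= p[0] <= x2 and p[1] >= y]
    hits.sort(key=lambda p: p[2], reverse=True)
    return hits[:k]


def q31_by_doubling(points: List[Point], x1: float, x2: float, y: float, z: float,
                    k_start: int) -> List[Point]:
    """Answer the Q(3,1) query [x1, x2] x [y, inf) x [z, inf) with top-k' queries."""
    if k_start < 1:
        raise ValueError("k_start must be positive")
    k = k_start
    while True:
        got = top_k_3sided(points, x1, x2, y, k)
        # Stop once the 3-sided range is exhausted, or once the lowest retrieved
        # score is below z: every unretrieved point then scores at most that low.
        if len(got) < k or got[-1][2] < z:
            return [p for p in got if p[2] >= z]
        k *= 2
```

Because $k'$ doubles, the total work is a geometric series dominated by the final query, mirroring the cost argument above.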

Appendix 6: Proof of Lemma 2

The space bound is obvious. If there are $n$ updates, by Chernoff bound (3), the number of insertions on $R$ is
at most $2n/\lg_B n$ with probability at least $1 - 1/n^2$ when $n$ is larger than 256. There can be at most the
same number of deletions on $R$. Hence, with probability at least $1 - 1/n^2$, the number of updates on $R$ is
$O(n/\lg_B n)$; as each update takes $O(\lg_B n)$ amortized I/Os, the total update cost is $O(n \lg_B n)$.

Now we analyze the query algorithm, considering $|S(q)| \ge 4k$ (the opposite case is already taken care
of at the beginning of Section 3). For $k \ge 7 \ln n \cdot \lg_B n$, we have $kp \ge 7 \ln n$, which is at least $3 \ln(3n^2)$ for $n \ge 256$.
Applying Lemma 1 with $\delta = 1/n^2$, we know that with probability at least $1 - 1/n^2$: (i) $R$ has at least $2kp$ elements,
and (ii) the element with rank $\lceil 2kp \rceil$ in $R$ has rank between $k$ and $4k$. When both conditions hold, our
algorithm reports the top-$k$ elements in $O(\lg_B n + k/B)$ I/Os. Hence, with probability at least $1 - 1/n^2$,
our algorithm terminates in $O(\lg_B n + k/B)$ I/Os. With probability at most $1/n^2$ (i.e., when at least one of (i) and
(ii) is violated), our algorithm spends $O(n/B)$ I/Os answering the query, adding only $\frac{1}{n^2} \cdot O(n/B) = o(1)$
to the expected cost.
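Assuming the sampling rate $p = 1/\lg_B n$ (which is what the bounds $kp \ge 7 \ln n$ and $2n/\lg_B n$ above suggest), the numeric side conditions work out as follows:

$$7\ln n \ \ge\ 3\ln(3n^2) = 3\ln 3 + 6\ln n \iff \ln n \ge 3\ln 3 \iff n \ge 27,$$

so the requirement $n \ge 256$ leaves ample room, and the failure branch contributes

$$\frac{1}{n^2} \cdot O\!\left(\frac{n}{B}\right) = O\!\left(\frac{1}{nB}\right) = o(1)$$

to the expected query cost.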

Appendix 7: Proof of Lemma 4

We prove the lemma also with tabulation. Let us define the list of the $B \lg_B(fl)$ largest elements of $G_i$ ($1 \le
i \le f$) as the prefix of $G_i$, and denote it as $P_i$. Let $P$ be the union of $P_1, \ldots, P_f$; we refer to $P$ as a prefix set.
$P$ contains at most $fB \lg_B(fl)$ elements. We compress $P$ by describing each element $e$ in $P$ (say, $e \in G_i$ for some
$i$) using its global rank in $G$ and its local rank in $G_i$, for which purpose $O(\lg(fl))$ bits suffice. Hence,
$P$ can be described by $O(fB \lg_B(fl) \cdot \lg(fl)) = O(\lg^{\epsilon} n \cdot \lg\lg n \cdot \lg_B \lg n \cdot \lg\lg n) = O(\lg^{\epsilon} n \cdot (\lg\lg n)^3)$
bits, which fits in a word.

We ensure that a compressed $P$ is always described by the same number of bits; for this purpose, we
force each prefix to have $B \lg_B(fl)$ elements by appending dummy elements as needed. In a compressed
$P$, the elements are first sorted by which sets they come from, and then by their local ranks. In this way, the
global rank of the element with local rank $r \le B \lg_B(fl)$ in $G_i$ can be found in constant time for any $r$ and $i$.
Let $\lambda$ be the number of possible compressed prefix sets, i.e., $\lambda = 2^{O(\lg^{\epsilon} n \cdot (\lg\lg n)^3)}$.
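The ordering "first by source set, then by local rank" is what makes the lookup constant time: the slot of the element with local rank $r$ in $G_i$ depends on $i$ and $r$ alone. Below is a minimal sketch of one possible fixed-width encoding; it is our assumption rather than the paper's exact layout, with a Python integer standing in for the machine word and global rank 0 marking a dummy.

```python
def pack_prefix_set(prefixes, width):
    """Pack a prefix set into one integer (illustrative encoding).

    prefixes[i] lists the global ranks of P_i in local-rank order, every prefix
    padded to the same length with 0 as the dummy; each slot stores
    (global_rank << width) | local_rank in 2 * width bits.
    """
    length = len(prefixes[0])
    word, slot = 0, 0
    for ranks in prefixes:
        if len(ranks) != length:
            raise ValueError("all prefixes must have the same (padded) length")
        for local, g in enumerate(ranks, start=1):
            if g >= 1 << width or local >= 1 << width:
                raise ValueError("width too small")
            word |= ((g << width) | local) << (slot * 2 * width)
            slot += 1
    return word


def global_rank_of(word, i, r, length, width):
    """Global rank of the element with local rank r in G_i (0-based i), via O(1) shifts."""
    slot = i * length + (r - 1)
    entry = (word >> (slot * 2 * width)) & ((1 << (2 * width)) - 1)
    return entry >> width
```

Because every prefix has the same padded length, the slot index `i * length + (r - 1)` needs no search.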

Suppose that we need to delete an element $e$ from $G_i$. If $e \in P_i$, using a B-tree on $G$, we find the global
rank $r$ of $e$ in $O(\lg_B(fl))$ I/Os. We observe that the new compressed prefix set $P'$ is determined by the
current compressed prefix set $P$, $i$ and $r$. To see this, first consider a compressed prefix $P_{i'}$ with $i' \neq i$: if an
element has global rank larger than $r$, it should have its global rank decreased by 1. Regarding the compressed
prefix $P_i$, $e$ itself is removed and the same is true; furthermore, all such elements in $P_i$ should also have their local ranks decreased
by 1. Finally, the last element of $P_i$ is discarded (i.e., marked dummy).
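The deletion rule for a compressed prefix set can be sketched as a pure function. The representation is hypothetical: each prefix is a fixed-length list of `(global_rank, local_rank)` pairs in local-rank order, with `None` as a dummy. The deleted element is dropped from $P_i$ and the freed slot is padded with a dummy at the end, which keeps every prefix at the same length.

```python
from typing import List, Optional, Tuple

Entry = Optional[Tuple[int, int]]  # (global rank, local rank), or None for a dummy


def delete_from_prefix_set(prefixes: List[List[Entry]], i: int, r: int) -> List[List[Entry]]:
    """New prefix set after deleting the element of global rank r from P_i (0-based i)."""
    new_prefixes: List[List[Entry]] = []
    for s, entries in enumerate(prefixes):
        out: List[Entry] = []
        for entry in entries:
            if entry is None:
                out.append(None)
                continue
            g, loc = entry
            if s == i and g == r:
                continue                    # the deleted element leaves P_i
            if g > r:
                g -= 1                      # elements ranked below r move up globally
                if s == i:
                    loc -= 1                # ... and locally inside G_i
            out.append((g, loc))
        out.extend([None] * (len(entries) - len(out)))  # freed slot becomes a dummy
        new_prefixes.append(out)
    return new_prefixes
```

The function builds new lists instead of mutating its input, matching the way a table entry maps an old compressed prefix set to a new one.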

Motivated by the above observation, we precompute a table that contains an entry for every combination
of $(P, i, r)$, and stores in that entry the corresponding $P'$. The entries are sorted in the lexicographic order of
the bit description of $(P, i, r)$ to allow constant-time lookups (by that description). The number of entries is
$\lambda \cdot f \cdot fB \lg_B(fl) = O(\lambda \cdot \mathrm{polylg}\, n)$. Hence, the whole table contains $O(\lambda \cdot \mathrm{polylg}\, n \cdot \lg^{\epsilon} n \cdot (\lg\lg n)^3) = o(n)$
bits. Furthermore, as the $P'$ in each cell can be easily computed in $O(\mathrm{polylg}\, n)$ time, the table can be
generated in $O(\lambda \cdot \mathrm{polylg}\, n) = o(n)$ time.

Finally, we need to insert into $P_i$ the element with rank $B \lg_B(fl)$ in $G_i$. Following the above
ideas, an insertion can also be handled in $O(\lg_B(fl))$ I/Os by precomputing another lookup
table occupying $o(n)$ bits. This table can also be generated in $o(n)$ time.

Excluding the precomputed tables, the structure includes: (i) a B-tree on each $G_i$, (ii) a B-tree on $G$,
and (iii) a compressed prefix set which fits in one word. The space is therefore linear. All the B-trees can be
easily built in $O(fl \cdot \lg_B(fl))$ I/Os. By querying the B-tree on $G$, we can easily generate each $P_i$ (including
the local and global ranks of the elements) in $O(fl \cdot \lg_B(fl))$ I/Os, after which the compressed prefix set
can be obtained in $O(fl/B)$ I/Os.

Appendix 8: Proof of Lemma 5

We now explain how to handle a deletion. Suppose that an element $e_{\mathrm{old}}$ is to be deleted from $G_i$ for some
$i \in [1, f]$. Let $r_{\mathrm{old}}$ be the rank of $e_{\mathrm{old}}$ in $G$ before the deletion. Except possibly for one pivot, the new
compressed sketch set can be deduced based only on the current compressed sketch set, $r_{\mathrm{old}}$, $i$ and the value of
$|G_i|$ before the deletion. To see this, consider first $\Sigma_{i'}$ where $i' \neq i$. Each pivot whose global rank is larger
than $r_{\mathrm{old}}$ now needs to have its global rank decreased by 1. Regarding $\Sigma_i$, the same is true, and every such
pivot should also have its local rank decreased by 1. Furthermore, the last pivot of $\Sigma_i$ should be discarded
(i.e., marked dummy) if $|G_i|$ was a power of 2 before the deletion: in such a case, we say that $\Sigma_i$ shrinks.
Finally, if $e_{\mathrm{old}}$ happens to be a pivot of $\Sigma_i$ (but not the last one), then a new pivot needs to be computed to
replace it. This is the only pivot that cannot be deduced; we call it a dangling pivot.

We precompute a deletion lookup table $T_{\mathrm{del}}$. Let $\Sigma$ denote a possible compressed sketch set. $T_{\mathrm{del}}$
contains an entry for every possible combination of $(\Sigma, r_{\mathrm{old}}, i, |G_i|)$, where the meanings of $r_{\mathrm{old}}$, $i$ and $|G_i|$
are as explained above. This entry contains the new compressed sketch set determined by $(\Sigma, r_{\mathrm{old}}, i, |G_i|)$;
in case there is a dangling pivot in the new sketch set, we set its global rank with a special mark (which can be
done by using one extra bit to represent each global rank). We sort the entries in the lexicographic order of
the bit-strings of the corresponding $(\Sigma, r_{\mathrm{old}}, i, |G_i|)$ to allow constant-time lookups. Just like $T_{\mathrm{ins}}$, an entry
in $T_{\mathrm{del}}$ is associated with a pointer to a linked list indicating the invalidated pivots of the sketch set stored in
that entry. $T_{\mathrm{del}}$ occupies the same space as $T_{\mathrm{ins}}$, and can be constructed with the same cost.

The concrete steps of deleting $e_{\mathrm{old}}$ are as follows. After obtaining its global rank $r_{\mathrm{old}}$ in $O(\lg_B(fl))$
I/Os, we look up $T_{\mathrm{del}}$ to find the new compressed sketch set. If $\Sigma_i$ shrinks, we delete the last pivot from
the B-tree on (the uncompressed) $\Sigma_i$ in $O(\lg_B \lg l)$ I/Os. If $e_{\mathrm{old}}$ was a pivot (say, the $j$-th one for some $j$),
we retrieve the element $e$ with local rank $\lfloor \frac{3}{2} \cdot 2^{j-1} \rfloor$ in $G_i$, and obtain its global rank $r$. This can be done
using $O(\lg_B(fl))$ I/Os in total. We then use $e$ to replace the dangling pivot in $O(\lg_B \lg l)$ I/Os. Finally, we re-compute the
invalidated pivots (if any) in the same way as in insertion. As analyzed in Section 3.2, such re-computation
increases the update cost by $O(\lg_B(fl))$ amortized I/Os.

Excluding the precomputed tables, the structure includes: (i) a B-tree on $G$, (ii) a B-tree on each $G_i$,
(iii) a B-tree on each $\Sigma_i$, and (iv) a compressed sketch set which fits in one word. The space is therefore
linear. All the B-trees can be easily built in $O(fl \cdot \lg_B(fl))$ I/Os, after which the compressed sketch set can
be obtained in $O(fl/B)$ I/Os.

Appendix 9: Proof of Lemma 6

Case $k \ge m$. We will collect a set $P$ of pivots, each of which is an element in some set $L_i$. The pivot
collection is carried out in $\lceil \lg_c m \rceil$ rounds. In the $j$-th round ($1 \le j \le \lceil \lg_c m \rceil$), $\lceil m/c^{j-1} \rceil$ sets of $L_1, \ldots, L_m$
are active, while the other sets are inactive. At the beginning, all of $L_1, \ldots, L_m$ are active.

In round $j \in [1, \lceil \lg_c m \rceil]$, we use the rank-operator to request an element with rank $c^j k/m$ in each of
the active sets. Remember that the operator can return any element whose rank in an active set is at least
$c^j k/m$ and less than $c^{j+1} k/m$. Let $P'$ be the set of elements fetched. Call each element in $P'$ a marker, and
assign it a weight $\lceil c^j k/m \rceil - \lceil c^{j-1} k/m \rceil = \Theta(c^j k/m)$. Pick the $\lceil m/c^j \rceil$ largest markers in $P'$ as pivots,
and add them to $P$; among them, the smallest is the cutoff pivot of this round. An active set remains active
if its marker was added to $P$ in this round, whereas the other active sets become inactive.

After $\lceil \lg_c m \rceil$ rounds, $P$ has $\sum_{j=1}^{\lceil \lg_c m \rceil} \lceil m/c^j \rceil = O(m)$ pivots. We perform a weighted selection to find
the largest element $e \in P$ such that the total weight of all pivots greater than or equal to $e$ is at least $k$. The
algorithm terminates by returning $e$.

Analysis. The algorithm finishes in $O(m \cdot \mathrm{cost}_{\mathrm{rank}})$ time because the $j$-th round takes $O((m/c^{j-1}) \cdot \mathrm{cost}_{\mathrm{rank}})$ time
(i.e., geometrically decreasing with $j$), and the weighted selection of finding $e$ from $P$ takes $O(m/B)$ time.
Next, we show that the rank of $e$ in $L = L_1 \cup \ldots \cup L_m$ falls in the range $[k, c'k)$ for some constant $c'$.

In any $L_i$ ($1 \le i \le m$), the pivot taken in a round must rank behind all those of earlier rounds, because the
pivot of the $j$-th round has rank in $[c^j k/m, c^{j+1} k/m)$. Furthermore, the cutoff pivot of each round cannot
be larger than $e$, because at each round $j$, we ensure that at least $\lceil m/c^j \rceil \cdot (c^j k/m) \ge k$ elements in $L$ are at
least the cutoff pivot. Each $L_i$ has at least one marker smaller than or equal to $e$; we refer to the largest such
marker as the succeeding marker of $L_i$. Note that this marker is not necessarily a pivot.

The rest of the proof is similar to the one in [14]. We sketch it here for completeness. If $L_i$ has at
least one pivot that is at least $e$, let $p_i$ be the smallest such pivot, and call $L_i$ a pivotal set. Let $L_i[e, p_i)$ be the
set of elements in $L_i$ that are at least $e$ but smaller than $p_i$, and $L_i[p_i, \infty)$ be the set of elements in $L_i$ that
are at least $p_i$. The size of $L_i[e, p_i)$ is asymptotically no greater than that of $L_i[p_i, \infty)$. This is because (i)
the elements in $L_i[e, p_i)$ must fall between $p_i$ and the succeeding marker of $L_i$, (ii) hence, $|L_i[e, p_i)|$ is at
most $O(1)$ times the weight of the succeeding marker, and (iii) the weight of the succeeding marker
is $O(|L_i[p_i, \infty)|)$. Hence, in the pivotal sets, the total number of elements at least $e$ is $O(k)$. On the other
hand, each non-pivotal set $L_i$ has fewer than $c^2 k/m$ elements at least $e$. They altogether contribute fewer than
$c^2 k = O(k)$ such elements.

Case $k < m$. From each $L_i$, we use the max-operator to request the max element in $L_i$. Let $P'$ be the
set of elements fetched (i.e., one from each $L_i$). We identify the $k$ largest elements in $P'$; let $e'$ be the $k$-th
largest element in $P'$. We turn a set $L_i$ inactive if its max element is smaller than $e'$. Run the above algorithm for $k \ge m$
on the $k$ remaining active sets. Suppose that the algorithm returns $e$. We return $\max\{e, e'\}$ as the
final answer. It is easy to prove that the algorithm is correct, and runs in $O(m(\mathrm{cost}_{\mathrm{rank}} + \mathrm{cost}_{\max}))$ time.

Appendix 10: Proof of Theorem 3

We now complete the description in Section 3.4 of the structure for $k = O(\mathrm{polylg}\, n)$. This structure uses
linear space, answers a query in $O(\lg_B n + k/B)$ I/Os, and can be updated in $O(\lg_B n)$ I/Os. This, together
with Lemma 2, establishes the theorem.

$\lg\lg n > \sqrt{B}$. Let us continue the discussion of Section 3.4. To support updates, for each internal node
$u$, build another B-tree on the scores of the elements in $G_u$. Similarly, for each leaf node $z$, build a B-tree
on the scores of the elements in $S_z$. Refer to these B-trees as score B-trees.

To insert a point $e$ in $S$, we first descend a root-to-leaf path $\pi$ to the leaf node $z$ where $e$ should be
stored. Remember that $\pi$ has $O(\lg_f n)$ nodes. At $z$, update all its secondary structures in $O(\lg^2_B b) =
O((\lg_B \lg n)^2) = O(\lg_B n)$ I/Os. Next, we fix the secondary structures of the nodes along $\pi$ in bottom-up
order. Let $\mathrm{parent}(z)$ be the parent node of $z$. If $e$ enters $G_z$, at $\mathrm{parent}(z)$, delete the element in $G_z$ with
the lowest score, and insert $e$ in $G_z$. Accordingly, the secondary structures of $\mathrm{parent}(z)$ are updated. In
general, after updating an internal node $u$, we check using the score B-tree of $u$ whether $e$ enters $G_u$. If
yes, at $\mathrm{parent}(u)$, delete the element in $G_u$ with the lowest score, insert $e$ in $G_u$, and update the secondary
structures of $\mathrm{parent}(u)$. By Lemma 5, we spend $O(\lg_B(fl))$ I/Os at each node, and hence, $O(\lg_B n)$ I/Os
in total along the whole $\pi$.
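The bottom-up propagation can be illustrated in isolation. The sketch below is a toy model, not the paper's structure: it keeps only the groups $G_u$ (as min-heaps of scores) over a fixed three-level hierarchy of key ranges, ignores the $(f, c^2 l)$-structures and B-trees, and the names `WIDTHS` and `insert_point` are hypothetical. It relies on the observation that once $e$ fails to enter $G_u$, it cannot enter the group of any ancestor, since $S_u \subseteq S_{\mathrm{parent}(u)}$.

```python
import heapq
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (level, index) in a toy hierarchy over keys [0, 64)
WIDTHS = (64, 32, 16)   # key range covered by one node at levels 0 (root), 1, 2 (leaf)


def insert_point(groups: Dict[Node, List[int]], key: int, score: int, cap: int) -> int:
    """Maintain G_u (the top-`cap` scores of the subtree) along the leaf-to-root path of `key`.

    groups[u] is a min-heap, so groups[u][0] is the lowest score in G_u. Returns the
    number of nodes visited, including the one where propagation stopped.
    """
    touched = 0
    for level in reversed(range(len(WIDTHS))):   # bottom-up along the path
        g = groups.setdefault((level, key // WIDTHS[level]), [])
        touched += 1
        if len(g) < cap:
            heapq.heappush(g, score)              # G_u not full yet: e enters
        elif score > g[0]:
            heapq.heapreplace(g, score)           # evict the lowest score; e enters
        else:
            break   # e misses G_u, hence it misses every ancestor's group as well
    return touched
```

In the actual structure, each successful step at a node would also update that node's secondary structures, which is where the $O(\lg_B(fl))$ cost per node arises.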

We now describe how to handle node splits. Suppose that a leaf node $z$ splits into $z_1, z_2$. First, build the
secondary structures of $z_1$ and $z_2$ in $O(b \lg^2_B b)$ I/Os. At $v = \mathrm{parent}(z)$, destroy $G_z$, and include $G_{z_1}$ and
$G_{z_2}$ into $G_v$. Rebuild all the secondary structures at $v$ in $O(fl \cdot \lg_B(fl)) = O(b \lg_B b)$ I/Os (Lemma 5).
This cost can be amortized over the $\Omega(b)$ updates that must have taken place in $z$, such that each update is
charged only $O(\lg^2_B b) = O(\lg_B n)$ I/Os.

A split at an internal level can be handled in a similar way. Suppose that an internal node $u$ splits into
$u_1, u_2$. Divide $G_u$ into $G_{u_1}$ and $G_{u_2}$ in $O(fl/B)$ I/Os, and then rebuild the secondary structures of $u_1, u_2$
in $O(fl \cdot \lg_B(fl))$ I/Os. After discarding $G_u$ but including $G_{u_1}, G_{u_2}$, we rebuild the secondary structures of
$\mathrm{parent}(u)$ in $O(fl \cdot \lg_B(fl))$ I/Os. On the other hand, $\Omega(fl)$ updates must have taken place in the subtree
of $u$ (recall that the base tree is a weight-balanced B-tree). Hence, each of those updates bears $O(\lg_B(fl))$
I/Os for the split cost. As an update bears such cost for at most one node per level, the amortized update
cost increases by only $O(\lg_B n)$.

An analogous algorithm can be used to handle a deletion in $O(\lg_B n)$ amortized I/Os. We omit the details,
which are straightforward.

Regarding space consumption, there are $O(n/(fb))$ internal nodes, each of which occupies $O(fl/B)$
blocks. Hence, recalling $b = flB$, all the internal nodes use $O(\frac{n}{fb} \cdot \frac{fl}{B}) = O(\frac{n}{fB^2})$ space in total. The overall space cost is
therefore $O(n/B)$. Global rebuilding is performed to ensure that the precomputed lookup tables always
consume $O(n/B)$ space. Specifically, after $n$ has changed by a factor of 2, we destroy the entire structure,
and rebuild everything (including the lookup tables) in $O(n \lg_B n)$ I/Os, making sure that the lookup tables
can be used until the input size increases to $2n$.

$\lg\lg n \le \sqrt{B}$. In this case, the base tree $T$ is a weight-balanced B-tree on the points of $S$ where an internal
node has at most $f = \sqrt{B}$ child nodes, and a leaf node stores at most $b = flB$ elements. For each node $u$
of $T$, define $S_u$ and $G_u$ as before. At each leaf node $z$, store $S_z$ in a structure of [27] for approximate range
$k$-selection. Consider an internal node $u$ with child nodes $u_1, \ldots, u_f$ (ordered in the same way as explained
in Section 3.4). For each $G_{u_i}$, create a logarithmic sketch $\Sigma_{u_i}$, which has $O(\lg l) = O(\lg\lg n)$ pivots. Since
$\lg\lg n \le \sqrt{B}$, the sketches $\Sigma_{u_1}, \ldots, \Sigma_{u_f}$ have $O(f \lg\lg n) = O(B)$ pivots in total and can therefore be stored in $O(1)$ blocks.
Use another block to store the element with the maximum score in each $G_{u_i}$.

The height of the tree is $O(\lg_B n)$. A top-$k$ range query is processed in the same manner as explained
in Section 3.4, except that, when applying Lemma 6, both the max- and rank-operators at each internal node
can now be implemented in $O(1)$ I/Os. The query cost is therefore $O(\lg_B n)$.

The structure clearly occupies linear space. It can be made dynamic using the strategies illustrated earlier
for the $\lg\lg n > \sqrt{B}$ case, so that an insertion or deletion is performed in $O(\lg_B n)$ amortized I/Os. We omit
the tedious details, which are straightforward.